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Abstract 

Recent results in extreme value theory suggest a new technique for statistical estimation of 
distribution tails {Embrechts et al, 1997), based on a limit theorem known as the Gnedenko- 
Pickands-Balkema-de Haan theorem. This theorem gives a natural limit law for peak-over- 
threshold values in the form of the Generalized Pareto Distribution (GPD), which is a family 
of distributions with two parameters. The GPD is successfully applied in a number of 
statistical problems related to finance, insurance, hydrology, and other domains. Here, we 
apply the GPD approach to the well-known seismological problem of earthquake energy 
distribution described by the Gutenberg-Richter seismic moment-frequency law. We analyze 
shallow earthquakes (depth h < 70 km) in the Harvard catalog over the period 1977-2000 in 
18 seismic zones. The whole GPD is found to approximate the tails of the seismic moment 
distributions quite well above M > 10 24 dyne-cm (i.e., moment-magnitudes larger than 
m w =5.3) and no statistically significant regional difference is found for subduction and 
transform seismic zones. We confirm that the b-value is very different (b=1.50±0.09 
corresponding to a power law exponent close to 1) in mid-ocean ridges compared to other 
zones (b=1.00±0.05 corresponding to a power law exponent close to 2/3) with a very high 
statistical confidence. We propose a physical mechanism for this, contrasting slow healing 
ruptures in mid-ocean ridges with fast healing ruptures in other zones. The GPD can as well 
be applied in many problems of seismic risk assessment on a regional scale. However, in 
certain cases, deviations from the GPD at the very end of the tail may occur, in particular for 
large samples signaling a novel regime. Such deviations are detected here in the sample 
containing earthquakes from all major subduction zones (sample size of 4985 events). We 
propose a new statistical test of significance of such deviations based on the bootstrap 
method. The number of events deviating from the tails of GPD in the studied data sets (15-20 
at most) is not sufficient for determining the functional form of those deviations. Thus, it is 
practically impossible to give preference to one of the previously suggested parametric 
families describing the ends of tails of seismic moment distributions: Gamma (Kagan, 1997), 
modified Gutenberg-Richter (Bird et al., 2000), Pareto with a crossover point (Pacheco et al., 
1992; Sornette et ah, 1996), or stretched exponential (Laherrere and Sornette, 1998). 
Possible statistical inference in such situation is discussed. 



INTRODUCTION 

Most complex systems around us exhibit rare and sudden transitions that occur over time 
intervals that are short compared to the characteristic time scales of their posterior evolution. Such 
extreme events express more than anything else the underlying "forces" usually hidden by almost 
perfect balance and thus provide the potential for a better scientific understanding of complex 
systems. These crises have fundamental societal impacts and range from large natural catastrophes 
such as earthquakes, volcanic eruptions, hurricanes and tornadoes, landslides, avalanches, lightning 
strikes, meteorite/asteroid impacts, catastrophic events of environmental degradation, to the failure of 
engineering structures, crashes in the stock market, social unrest leading to large-scale strikes and 
upheaval, economic draw-downs on national and global scales, regional power blackouts, traffic 
gridlock, diseases and epidemics, etc. The long-term behavior of many of these complex systems is 
often controlled in large part by these rare catastrophic events: the universe was probably born during 
an extreme explosion (the "big-bang"); the nucleosynthesis of important atomic elements 
constituting our matter results from the colossal explosion of supernovae; the typical largest 
earthquake in California repeating about once every two centuries accounts for a significant fraction 
of the total tectonic deformation; landscapes are more shaped by the "millennium" flood that moves 
large boulders rather than the action of all other eroding agents; the largest volcanic eruptions lead to 
major topographic changes as well as severe climatic disruptions; evolution is characterized by 
phases of quasi-static behavior interrupted by episodic bursts of activity and destruction; financial 
crashes can "evaporate" in an instant trillions of dollars; political crises and revolutions shape the 
long-term geopolitical landscape; even our personal life is shaped on the long run by a few key 
"decisions/happenances"'. 

Three major outstanding scientific questions related to extreme events are: 1) can the statistical 
distributions of extreme events be characterized? Can it be shown to be the smooth extrapolation of 
the distribution of smaller events or are extreme events "outliers"? 2) If extreme events are outliers, 
what are the physical mechanisms responsible for their occurrence? How may such large-scale 
patterns of catastrophic nature evolve from a series of interactions on the smallest and increasingly 
larger scales? 3) Are there specific precursory signatures? Do extreme events present a degree of 
predictability not shared by their small counterparts? 

Here, we address the first question in details with the motivation that, if one can model properly 
the distribution, one can in principle predict at least the frequency of extreme events. However, there 
is a fundamental question that extreme events may not belong to the same distribution function as the 
small and intermediate events, and may thus be "outliers". It is thus of special importance to develop 
statistical methods that analyze specifically as closely as possible the range of its extreme values or 
the tail of the distributions rather than the whole of the distributions. 

Fitting parametric families of distributions to experimental data is a classical statistical problem. 
There exists a large collection of well-known standard families: Gaussian, Gamma, Weibull, Pareto 
and many others. Sometimes, there are physical grounds to choose a particular parametric family. For 
instance, when one studies time intervals between events, it is natural to use the exponential 
distribution if a random (Poissonian) flow of events is expected (Sornette and Knopoff, 1997). A 
Gaussian distribution is often assumed when the conditions of applicability of the Central Limit 
Theorem are satisfied. Unfortunately, such situations are seldom: usually the investigator has little 
idea if any about physical mechanisms determining the distribution of the observed data. In addition, 
the Central Limit Theorem is not adapted to the very nature of the problem of fitting the large and 
extreme deviations away from the center of the distribution because it does not apply to the tails of 
distributions of finite sample sizes (see e.g. {Frisch and Sornette, 1997; Sornette, 2000) and 
references therein). It is thus natural to consider the problem of fitting of a parametric distribution 
only to the extreme range of the data. 

For this purpose, we can rely on an important limit theorem characterizing the so-called Peak- 
Over-Threshold (POT) distributions, discussed in more detail below. In a nutshell, the POT refers to 
the distribution of events conditioned on being larger than a predetermined threshold. This theorem 
states that, under some general conditions, this limit distribution is the Generalized Pareto 
Distribution (GPD), which depends on only two parameters (shape and scale). This theorem thus 
provides statistical grounds to fitting distributions only in the tail. 
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We study the problem of tail fitting by the GPD and some of its new aspects concerning 
possible deviations from the GPD at the very end of tails of distributions. The statistical technique 
presented here is illustrated using the well-known seismological problem of earthquake energy 
distribution. A new evidence of spatial homogeneity of shallow earthquake distributions in 
subduction zones is demonstrated and confirm known results in this domain [Gutenberg and Richter, 
1954; Main and Burton, 1984; Main, 1992, 1996; Cornell, 1994; Sornette and Sornette, 1999; Kagan, 
1994, 1997, 1999; Romanowicz and Rundle, 1993; Romanowicz, 1994; Okal and Romanowicz, 1994; 
Pacheco and Sykes, 1992; Pacheco et ah, 1992; Scholz, 1994a, 1994b]. We confirm that the b-value 
is very different (b=1.50±0.09) in mid-ocean ridges compared to other zones (b=1.00±0.05) with a 
very high statistical confidence and propose a physical mechanism contrasting "crack-type" rupture 
with "dislocation-type" behavior. We also discuss possible problems with this result and contrast it 
with those obtained recently by Bird et al. (2000). The main interest of seismologists concerns the 
very end of tails of seismic moment distributions. Several parametric families (Gamma distributions 
[Main and Burton, 1984; Main, 1996; Kagan, 1994, 1997], Pareto distributions with a crossover point 
[Sornette et al.], Weibull distributions [Laherrere and Sornette, 1998]) were suggested for earthquake 
energy distribution including the tail range, but none of these models is generally accepted. A 
detailed study of this problem leads us to a conservative conclusion: none of the suggested laws is 
preferable because of a very small number of observations in the extreme range. In other words, these 
different families of distributions are undistinguishable with respect to the available data. 
Nevertheless, some cautionary statistical recommendations can be suggested, which are discussed in 
the last section. 

THE GENERALIZED PARETO DISTRIBUTION 

We first summarize useful known facts about GPD (see [Embrects et al., 1997; Bassi et ah, 
1998] for more details). For this, we consider a sample of iid random values (rv) and 
denote its maximum as max N : 

max N = max (x; ,...,x N ). 

The most important results in Extreme Value Theory refer to distribution functions (DF) F(x) whose 
normalized maxima have some non-degenerate limit distribution. For such DF, there exists some 
normalizing constants a N ,b N such that 

( max N - b N ) / a N =>Y 

where the arrow means that the left-hand- side tends to Y in distribution and 7 is a random variable 
with some non-degenerate DF. All DF F(x) whose normalized maxima have as a limit distribution a 
particular DF H(x) constitute the Maximum Domain of Attraction of the DF H(x), denoted MDA(H). 
It can be shown that all possible non-degenerate limit DF belong (up to a scale factor and a shift) to a 
single one-parameter family of DF: 

[ exp(-(l +£x) -1/| ), -oo<^<+oo, £*0, l+5x>0, 
(la) H.(x)= I 

[ exp( - exp( -x )), £, = 0, - °° < x < °°. 

Note that the Gumbel family exp(- exp( -x )) indeed corresponds to taking the limit £ = of the first 
Frechet+ Weibull family exp( - (1 + ^x )~ 1/ ^). H^(x) is the standard Generalized Extreme Value 
Distribution. 
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The class MDA(H^) is very broad. It includes in particular such distributions as Gaussian, 
Pareto, and Weibull. Necessary and sufficient conditions that ensure that F(x) belongs to MDA(H^) 
can be found in [Embrechts et ah, 1997]. The MDA(H^) has a number of important applications. Its 
importance stems from the Gnedenko-Pickands-Balkema-de Haan theorem (G-P-B-H theorem) 
which describes how the distribution of the large events conditioned to be larger than some threshold 
can be characterized by the Generalized Pareto Distribution (GPD). The GPD is derived from the 
distribution H^(x) of the largest value given by (1), such showing the link with Extreme Value 
Theory: 

(lb) G(x/£,s) = 1 + ln( H^(x / s)) = 1 -(1 +£x/sV 1/! % 

where the two parameters s) are such that - °° < £, < +°° and s > 0. For ^ > , x > and for ^ < 0, 
< x < - s / 

In order to state the G-P-B-H theorem, we define the right endpoint x F of a DF F(x) as 
x F = sup{ x: F(x) < 1 } , 

which can often be considered to be infinite in many practical applications, and the excess 
distribution F u (x) 

F u (x) = P{X-u<x|x>u}, x > 0. 



Gnedenko-Pickands-Balkema-de Haan theorem 

Suppose F(x) is a DF with excess distribution F u (x), u > 0. Then, for - °° < £, < +°°, 
F(x) g MDA(H^) if and only if there exists a positive function s(u) such that 

(2) lim sup I F u (x)- G(x/£,s(u)) I =0. 

uTx < x < x - u 
F F 

Here and further on, F (x) denotes the tail of the DF F(x) : F (x) = 1 - F(x). Other names for F (x) 
are the "complementary cumulative" distribution or "survivor" function. Intuitively, the statement (2) 

means that the tail F (x) of the distribution is asymptotically given by the GPD defined by (lb) as x 
approaches the very end of the tail of the distribution. The strength of the G-P-B-H theorem is that it 
is not a statement only on the largest value of a data set, as is the case for the Extreme Value Theory 
leading to the limit distributions (la). It has indeed been shown [Knopoff and Kagan, 1977] that 
using Extreme Value Theory to constraint the shape of the tail of the distribution is sub-optimal as 
one effectively discards a significant part of the data which leads to unreliable results. In contrast, the 
analysis of the tail provided by the G-P-B-H theorem makes full use of all the data present in the tail. 

Let us denote by N u the number of observations exceeding a threshold u and by yi,...,ym the 
observations decreased by u : y, ■ = Xj® - u,; Xj(g > u. The G-P-B-H theorem yields an approximation to 

the tail F (x) by a GPD as a tail estimator: 



F(x + u) = G (x/|,S)x(N u /N). 
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The estimates of the two parameters <^ , s can be obtained through the Maximum Likelihood 
estimation (ML). The log-likelihood / equals 

(3) / = -N u lns- + ln(l+£ yi /s). 



Maximization of the log-likelihood / can be done numerically. The limit standard deviations of ML- 
estimates asN^«> can be easily obtained [Embrechts et ah, 1997]: 

(4) a 6 = (1+^/VaT; a s = s ^2(1 + 

In practice, one usually replaces the unknown parameters in equation (4) by their estimates. One can 
also calculate the q-quantile estimator: 

x q = u + ( N/N u (l-q) "I - 1), 

which is the value of the random variable not overpassed with probability q. 

It should be noted that the scale parameter s = s(u) depends on the threshold u, while the shape 
parameter £, is in theory independent of u and solely determined by the DF F(x) of the data points. 
Thus, one can hope to find a reasonable GPD fit to the tail if it is possible to take a sufficiently high 
threshold u and to keep a sufficiently large number of excesses over it. Of course, it is not always 
possible. 

The shape parameter £, is of great interest in the analysis of the tails. When x becomes large 
and £, > 0, the tail of the DF in equation ( 1 ) approaches a power function 

G (x/is) = (^x/s)- 1 ^ . 

l/£, is therefore the exponent of the survivor distribution function. It corresponds asymptotically to 
the exponent /3 for the Pareto law. Thus, the GPD is asymptotically scale invariant. The 
parameterization of the distribution tails by l/£, is more appropriate from a statistical point of view. 

In the sequel, we apply this GPD-approach to the distribution of seismic moments M 
characterizing energy release of earthquakes. In this case, the slope b of the Gutenberg-Richter 
magnitude-frequency law is approximately proportional to the exponent l/£, , with a coefficient of 
proportionality usually taken equal to 2/3 : b = 3/(2^) . 



APPLICATION OF THE GPD TO THE ANALYSIS OF SEISMIC ENERGY 
DISTRIBUTION 

Seismic catalog, regionalization 

We used the Harvard catalog of seismic moments covering the period 01.01.1977-31.05.2000 
[Dziewonski et ah, 1994]. Since the distribution of earthquake energy for deep events differ 
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significantly from that of the shallow ones [Kagan, 1997], we restrict our analysis to shallow 
earthquakes with focal depth h < 70 km. Such events constitute about 75% of the catalog. 

According to the G-P-B-H theorem, the GPD can be used under the condition that the threshold 
u is large enough. We have tried several values of thresholds and found that for u = 10 23 dyne-cm (the 
lower limit of complete registration of the Harvard catalog) the approximation is rather poor. A 
reasonable quality for the approximations of tails is obtained for thresholds equal to or above u = 10 24 
dyne-cm. We thus chose the common threshold u = 1 24 dyne-cm for all regions. Recall that 1 dyne- 
cm = 10" 7 N-m, thus the threshold u = 10 24 dyne-cm= u = 10 17 N-m corresponds to a moment- 
magnitude m w =(2/3) log u - 6 = 5.33. 

It should be noted that, although our cut-off of 10 24 dyne-cm does not ensure perfect 
completeness of the catalog in the whole time period 1977-2000 (such completeness can be 
guaranteed only after 1983), the fraction of under-sampled events in the range M> 10 24 in 1977-2000 
does not exceed 2% according to our estimation. Therefore, the lower threshold M = 1 24 dyne-cm is 
admissible in particular for the study of the tail performed here. Independent estimations of 
completeness show that the Harvard catalog for the whole earth since 1977 is complete for M>10 24 45 
dyne-cm, i.e., m w >5.63. Starting from 1983, we find that the catalog is complete above our threshold 
10 24 dyne -cm, i.e. above m w =5.33. These results are consistent with the estimations of Kagan (1999) 
and Bird et al. (2000). 

We did not exclude aftershocks from the catalog for the two following reasons. First, since we 
analyze the largest events with seismic moments M > 10 24 dyne-cm, the fraction of aftershocks in this 
range of energies is quite low. Second, any procedure of aftershock removal is more or less arbitrary 
and may introduce a bias in the estimation of the parameters. 

One of the applications of our analysis is the investigation of the geographic variability of the 
parameters of the GPD. Detailed studies of the spatial variability of seismic parameters can be found 
in [Kronrod, 1984; Cornell, 1994; Kagan, 1997]. The worldwide seismicity is usually studied using 
the Flinn-Engdahl regionalization or some of its modifications are used [Flinn et al., 191 '4; Kronrod, 
1984; Young et al., 1996]. This regionalization contains 50 regions and is too detailed for our 
analysis: some regions are too small and contain only 20-30 events, which is certainly not enough for 
an analysis of the tail of the distribution. Therefore, we used two versions of the regionalization with 
larger regions. In the first one, we have regrouped all Flinn-Engdahl regions into 1 1 clusters (zones) 
containing on the average 350 events per zone with seismic moments M > 10 24 dyne-cm. In the 
construction of these zones, we considered the main characteristics of each seismo-tectonic regimes: 
subduction (7 zones), mid-ocean ridges (3 zones), collision or intracontinental regions (1 zone). This 
version of regionalization can be called "formal". In the second version, we used a geophysical 
approach. We tried to delineate zones in such a way that centroid-moment tensor solution would 
correspond to the current plate motion at the particular place (see the model NUVEL-la, DeMets et 
al., 1990, DeMets et al., 1994). One of the axes of principal stress (compression or dilatation) was 
close in this case to the vector of relative displacements of the plates. Thus, we have selected 1 1 
subduction zones (4946 earthquakes with M > 10 24 dyne-cm), 3 midocean ridge zones (1251 
earthquakes with M > 10 24 dyne-cm), 3 shear zones (282 earthquakes with M > 10 24 dyne-cm), and 
one continental collision zone (526 earthquakes with M > 10 24 dyne-cm). These 18 zones are shown 
in Fig. 1 . This version of regionalization will be referred to as "geophysical". We stress that this 
regionalization was performed before the statistical analysis and was fixed throughout the analysis, in 
order to avoid any possible bias. 

Further detailed analysis of the seismic energy distribution was only performed on two 
representative zones, one subduction zone (SZ) and one midocean ridge zone (MORZ). The results of 
this analysis were found quite similar for both versions of the regionalization. We can thus conclude 
that the tail of the distribution of seismic energy is not strongly affected by a particular method of 
regionalization. For this reason, we describe below in detail only the results of the analysis performed 
on the "geophysical" regionalization. 

It is known that earthquakes in MORZ can be subdivided further into two subclasses. Events of 
the first class occur in spreading segments of a ridge, relative plate velocity ranging from several 
mm/y up to 150 mm/y. Events of the second class are located in vicinity of transform faults. We 
intend to perform such detailed analysis of earthquakes in MORZ elsewhere confining ourselves here 
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to the comparison of the two general types of zones: subduction and midocean ridges (cf. Kagan, 
1997). 



Estimation of the GPD-parameters 

The estimates of the shape parameters for 1 1 SZ and 3 MORZ are given in Table 1 . The GPD- 
fits for 6 SZ and for 2 MORZ are illustrated by Fig.2a-2h. In the fourth column of Table 1 , the % 2 - 
values for the GPD are shown (the corresponding number k of degrees of freedom and probabilities 
of excess e are given in parentheses). In the X 2 -test, we chose the number of discrete intervals k in 
such a way that the number of observations in each interval was at least 8-10 as it is typically 
recommended. The number of degrees of freedom indicated in parentheses of the fourth column of 
Table 1 equals (k - 3) since two parameters were estimated and one degree of freedom is lost because 
of normalizing restriction on sum of probabilities. 

The excess probability e for a given fit is defined as the probability that ^-values equal to or 
larger than the one obtained by the GPD fit can be obtained due to "natural" noisy fluctuations 
decorating the data. A small e signals an anomalous large x 2 -value which is difficult to explain by 
normal noise and thus indicates that the model using the GPD is inadequate to account for the data. 
Thus, the hypothesis that the GPD accounts for the data is rejected only if e is very small, say, less 
than 0.05 corresponding to the so-called 95% confidence level. We read in the fourth column of table 
1 that the excess probabilities e for all the zones are larger than 0.26. Thus, on the whole, the GPD 
approximates quite satisfactorily the tails of the sample distributions in the different zones, in the 
range M > 10 24 dyne-cm. Sometimes, deviations from the GPD for the very largest values can be 
noticed visually (see e.g. Fig.2d-2e). However, the sensitivity of the % 2 -test to such deviations is 
obviously not sufficient to account for these visual observations. It is indeed well known in statistical 
theory that the X 2 -test is rather insensitive, in particular for detailed estimation of the tail behavior. 
One possible origin for this lack of sensitivity is the scarceness of the number of events in individual 
zones which does not reveal such single outliers. Below, we propose a new method for the statistical 
analysis of the tail behavior which is much more sensitive to possible deviations between the sample 
tails and the fitted distribution. 

Table 1 demonstrates a significant difference in the estimates of the shape parameter £ between 
SZ and MORZ, which was noticed earlier [Okal and Romanowicz, 1994; Kagan, 1997, 1999]. The 
statistical significance of this difference is so large that there is no doubt on its reality. Indeed, the 
estimate of £, for the sample, which includes all 1 1 SZ taken together, is 

<§>sz= 1.51 ±.036, 
whereas the estimate for the sample uniting all 3 MORZ is 

<£>morz= 1.02 ±.057. 
The normalized difference equals 



(<^> sz - <^> M0 Rz)/((a S z) 2 + (Omorz) 2 ) 1 ' 2 = 7.26 
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which corresponds to the Gaussian probability of excess £ s 2xl0" 13 (note that fluctuations of l/£, 
are asymptotically Gaussian (Sornette et al., 1996; Sornette, 2000)). Thus, the difference between the 
estimate of £, for the SZ and MORZ has a very high level of significance. We observe some 
differences in the estimates of £, among the 1 1 subduction zones, as well as among the 3 ridge zones 
but they are not so pronounced as for the difference between the SZ and MORZ and do not show 
statistical significance. 

This strong and significant difference in the shape parameter t, between the SZ and MORZ could 
be attributed to the well-known difference in their seismicity level: the seismic energy dissipated in 
SZ is incomparably higher than that in MORZ. In order to check this hypothesis, we tried another 
method for estimating the shape parameter £, based on samples whose size was normalized by the 
level of seismicity. In each zone, the threshold u was therefore chosen so that the corresponding 
sample size N u would be equal to 75% of its original size N. This ensures that the estimation of £, is 
based on the same fraction of the data in the tail distribution for all zones. The results of this method 
of estimation are shown in column 5 of Table 1 . We see that the estimates of £, do not change 
significantly. The strong difference between the averages of £, for SZ and MORZ remains the same 
(7.48 vs 7.26 for the first method of estimation). Thus, the significant difference in the shape 
parameter £, in the range M > 10 24 dyne-cm between SZ and MORZ is not a bias due to differences in 
seismicity level and is confirmed. 

In the discussion section at the end, we propose a physical mechanism to explain the origin of 
the difference between the slope value (b = 3/(2^) = 1.47±0.08) in midocean ridges compared to 
subduction zones (b = 3/(22;) = 0.99 ±0.03), which contrasts ruptures on faults remaining weak in the 
ocean ridges with rupture on faults healing fast in the other zones. In a recent analysis, Bird et al. 
(2000) explain the difference of b-value found earlier (Okal and Romanovicz, 1994; Kagan, 1997, 

1999) and confirmed here, on the basis that an effective larger b-value will be found when mixing 
power law distribution with different "corner" magnitudes. Indeed, a similar effect has been shown to 
increase the b-value in a population of faults with a continuous power law distribution of lengths 
(Sornette et ah, 1991; Lomnitz-Adler, 1992). 

We have checked this possibility specifically in the present context with negative conclusions. 
The difference with (Sornette et al., 1991; Lomnitz-Adler, 1992) is that there are only two populations 
of earthquakes and not a continuum. We proceed as follows. The same distribution as in (Bird et al., 

2000) , namely the Modified Gutenberg-Richter law (MGR), is considered: 

(5) F(x) = 1 - (M t /x) f3 exp((M t -x)/M c ), x > M t , 

where M, is the lower cutoff, (3=(2/3)b is the slope parameter of the Gutenberg-Richter law; M c is so- 
called "corner seismic moment". This distribution differs from the classical Gutenberg-Richter law 
by the exponential taper in equation (5). According to (Bird et al. ,2000), oceanic earthquakes are 
subdivided into two main subclasses: "spreading" events, and "transform fault" events, having the 
same (3-values ((3 = 2/3) but different corner moments: M ic and M 2c correspondingly (M ic = 10 24 73 
dyne-cm which corresponds to the magnitude mi c = 5.82 and M 2c = 10 26 35 dyne-cm which 
corresponds to the magnitude m 2c = 6.9). The seismic energy flux of transform earthquakes is much 
higher than the flux of spreading ones (approximately 17 times higher). Transform earthquakes 
constitute about 70% of all oceanic events. 

In order to measure quantitatively a possible effect of mixing these two types of events on estimates 
of the slope of distribution tails, we have constructed artificial samples obeying a distribution with 
the following tail chosen to capture as closely as possible the model of (Bird et al. ,2000): 



(6) F (x) = (M t /x) p [ 0.3 exp((M t -x)/M lc ) + 0.7 exp((M t -x)/M 2c ) ] 

The sample size was chosen with N = 1251, the same as in all 3 MORZ of our catalog. The lower 
cutoff M t was taken 1 24 dyne-cm. We applied the method suggested by (Bird et al. ,2000) of 
estimating the GPD parameters both to samples obeying the mixed distribution (6) and to the separate 
components corresponding to the two corner moments Mi c and M 2c . We have carried out a dozen of 



numerical experiments with different random samples and got the following estimates of the 
form parameter of the GPD: 

£i = 0.31 -0.44; fc w = 1.03 + 1.21; 1.22 + 1.32, 

where corresponds to corner moment M ic (I = 1,2), and ^ w corresponds to the mixture (6). 
Transforming the ^-estimates into the asymptotic slope parameter b by the formula b = 3/(2^) we get: 

bi = 3.41 +4.84; b w = 1.24 + 1.46; b 2 = 1.14 + 1.23 . 

We can conclude that the ^-estimates of a mixture are always within the interval formed by the 
estimates of the distinct components. Thus, mixing distributions with different corner moments can 
not result in increasing slope parameter (3 or b. 

There is another problem with Bird et al. (2000) explanation of the difference of b-values by 
mixtures of power law distributions with different "corner" magnitudes. Indeed, the "corner 
magnitude" M c (which is the crucial parameter in their model of seismic catalogs) depends heavily on 
the lower cutoff M t and is thus unreliable, due to the fact that for different M t the exponential taper 
exp((M t -x)/M c ) starts to act at different places, namely at M= M t . Therefore, for different M t , the 
taper has different "distances" to the "bent down", which should then be compensated by different 
M c . The value of M c is thus directly controlled by the choice of the lower cut-off M, and is thus not 
only a characteristic of the "bent down" but also of the lower cut-off M t . Our detailed analysis of the 
very end of the tails of the distribution presented in the next section shows that an a priori choice of a 
distribution can not be warranted and that there are still too few large events to draw any conclusion. 
In addition, imposing an exponential taper may have a dangerous impact on the determination of the 
b-value itself, that may not be obvious but results, as we have stressed, from the competition between 
the b-value and the corner moment, especially in the relatively small available data sets. This shows 
that a purely statistical approach using only the available data has reached its limit and further 
progress, bearing waiting several more decades, can only be achieved by adding geological and 
physical mechanisms to bear on the question, as we attempt in the discussion. 

Detailed analysis of the ends of tail histograms 

We already noted above that at the very end of the tail histograms, some deviations from the 
GPD may occur (see e.g. Fig.2d-2e). We are now going to analyze these deviations in detail. This 
phenomenon is another manifestation of the well known "bent down" of the Gutenberg-Richter 
magnitude-frequency law in the range of high magnitudes, which was widely discussed in the 
literature [Main and Burton, 1984; Okal and Romanowicz, 1994; Pacheco et al. 1992; Kagan, 1994, 
1997, 1999; Sornette and Sornette, 1999; Molchan et al, 1996, 1997]. A possible explanation for this 
"bent down" is the following. Ruptures of very large earthquakes break the whole thickness of the 
seismogenic layer. Thus, if magnitude exceeds some threshold, say mo = 7.5, then the source 
dimensions for events with m > m can only increase in "length" and "width" but not in depth (the 
source "loses one dimension"). Thus, some discrepancy in the frequency of occurrence of magnitudes 
exceeding m may be expected. However, it is not clear a priori if this discrepancy must be a "bent 
down" or a "bent up" as this has to be decided by the physics of dynamical rupture and the way large 
earthquakes develop as a function of space dimension. A more general argument in favor of the 
ultimate eventual "bent down" involves the concept of energy conservation. Since the earth is finite, 
the extrapolation of the GPD with an exponent l/£, less than 1, which predicts an infinite rate of 
energy release, is bound to fail and to transition to a faster decay at very large moments, hence an 
ultimate "bent down". However, this ultimate "bent down" could occur at even higher moments, 
possibly beyond a bent-down or bent-up induced by the finite size of the seismogenic layers. 

On the other hand, the G-P-B-H theorem assumes a regular behavior of the DF at infinity. It is 
clear that such regular behavior cannot be observed in any real finite data. Therefore, even if the GPD 
approximates data in some range quite well, this approximation may not extend beyond some limit. 
The question is to find and estimate this limit as precisely as possible. Thus, one can expect that even 
if the tail histogram is approximated quite well in a moderate range, it might have "a change point" 
somewhere near the right end of the range. By visually inspecting the tail histograms shown in Fig.2, 
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one can suppose the existence of such "change point" in r4 (Taiwan), r6 (Solomon Isls), (see 
Fig.2d-2e). But of course, a strict statistical test is needed to check the significance of these 
deviations. 

In order to increase the sample size, it is desirable to put together the events of several 
subduction zones into a single sample (in contrast, the mixture of SZ and MORZ is inadmissible 
since they differ significantly in the shape parameter). As we have already said, based on our 
statistical analysis, it is not possible to reject the hypothesis of homogeneity of all SZ. Indeed, we 
applied the % 2 -test of homogeneity to the SZ. It did not reveal significant heterogeneity, although 
some difference in the distributions among SZ exists (the excess probability e is 0.07, which did not 
allow us to reject the hypothesis of homogeneity with the usual 95% confidence level; however, the 
hypothesis of homogeneity would be rejected at the 90% confidence level). Besides, we had some 
geophysical grounds to unite all 1 1 SZ into one large sample (N = 4985, M > 10 24 dyne-cm), since 
they have similar tectonic characteristics. The tail histogram of this united SZ sample is shown in 
Fig.3 together with the fitted GPD. The same Figure shows the tail histogram of the MORZ united 
sample (N = 1251, M > 10 24 dyne-cm) in order to demonstrate once more the significant difference of 
distributions between SZ and MORZ. 

It is clear from Fig.3 that the tail of the distribution of moments in the SZ contains about 20 
extreme observations that deviate (visually) from the GPD curve. A "change point" occurs apparently 
somewhere near M = 5xl0 27 dyne-cm (magnitude m w = 7.8), whereas the MORZ tail has such a point 
approximately at M = 1.5xl0 26 dyne-cm (magnitude m w s 6.8). We will now consider the problem of 
the statistical detection of this "change point". 

We suggest a new method for the statistical treatment of this problem. This method is based on 
the bootstrap approach [Efron and Tibshirani, 1986]. The details of the method are given in the 
Appendix. The problem is divided into two parts. The first one consists in the statistical testing of the 
null hypothesis H that the GPD is valid in the whole range covered by the sample. The second part 
includes the estimation of the "change point" if H is rejected. 

The statistical test of the hypothesis H is constructed as described in details in the Appendix. 
We first perform a transformation from the extreme values yi>. . >yn into the tail distribution 
F (yi)<. ...< F (yi). This transformation converts variables which vary extremely wildly (with a 
power law tail with exponent smaller than 1) into variables with much more manageable fluctuations 
which are uniform in the interval [0,1]. We keep the r first largest values of y and thus the r smallest 
values of F . We then normalize the r extreme tail values F as described in the Appendix by their 
ML mean and standard deviation and take the sum of their squares S r as a measure of the deviation of 
the sample from the GPD. Then, we transform S r into a dimensionless statistic e r : 

e r = T(r/2; Sr), 

where T(a, x) is the incomplete Gamma function. This transformation makes it possible to compare 
the significance of the deviations for different values of r. The smaller e r , the less probable is the 
hypothesis H . 

If the normalized deviations were standard independent Gaussian random values, then e r would 
give the probability of exceeding the value S r under the hypothesis H . For non-Gaussian values with 
finite variance (as is the case here with the statistics of the F ), we can estimate with any desired 
accuracy the statistical significance using the bootstrap method. We can then optimize the choice of r 
by minimizing the value (e r ) over r. Thus, the final decision statistics is 

^mi„ = min (£ r ) • 



The distribution of the statistic e r is estimated by the bootstrap method. We used in this estimating 
procedure 1000-4000 random trials with the parameters of the GPD fixed at their maximum 
likelihood estimates. Thus, we estimated the probability e of the random statistic <5 min under 
hypothesis H to be less than the observed sample value of min(e r ). The smaller the probability e, 

r 

the less probable is the hypothesis H . The results of our application of the proposed technique to our 
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catalogs are shown in Table 2. We see that the decision statistic is significantly small only for 
three regions: r4 (Taiwan), r6 (Solomon Isls), and rl4 (Indian ridge), which coincide with the visual 
observations discussed above. 

The deviation for rl (Alaska), r8 (Mexico) and rl2 (Pacific ridge) are at the limit of significance. 
A very distinct positive deviation is obtained both for the sample including all 1 1 SZ and for the 
sample including the 3 MORZ. But even in the most favorable case of the SZ, the total number of 
estimated deviating extreme events is 17, whereas in the other cases, this number is even smaller. 

If the hypothesis H is rejected, then the "change point" k*, defined as the order of the rank of 
the ordered moments at which the transition occurs, can be estimated by the maximum likelihood 
method (see the Appendix for details). The results of this estimation are shown in the last column of 
Table 2. 

The normalized deviations of the tail values for several regions are presented in Fig.4a-4g. 
Comparing these deviations with the non-normalized ones shown in Fig.2, it is clear that our 
proposed normalization makes much more apparent the significance of the deviations. 

The normalized deviations of the largest event p N (see equation (2A)) for r5 (Mariana Isls) and 
rlO (South Sandwich Isls) are negative. It means that these maxima exceed the values predicted by 
the GPD. Instead of the "bent down", they demonstrate a "bent up". Sometimes such outliers are 
observed in real data {Laherrere and Sornette, 1998). They can be called "king size" values or 
"outliers" in the sense defined in the introduction, i.e. extreme exceptions not obeying a general rule. 
Thus, our method makes it possible to estimate the statistical significance of the "king size" 
deviations. 

In order to describe a functional form of such deviations, one can use different parametric 
families. As we already said, several families were used in similar situations in order to modify the 
Guteberg-Richter law at the end of magnitude range: Gamma distributions [Main and Burton, 1984; 
Kagan, 1994, 1997, 1999], Pareto distributions with a crossover point [Sornette et ah, 1996], Weibull 
distributions [Laherrere and Sornette, 1998]. Although the catalogs used in those works contained 
several hundreds of events (or even more), the number of observations that are essential for the end 
estimation were in fact comparable to that of our study (equal to 17). Such a small number is 
insufficient for a reliable estimation of any functional form of the tail. In order to demonstrate this 
statement, we consider a synthetic example with the parameters taken from the real data. We consider 
the following tail function: 

Gi(x/^s,); 

(7) F(x) = 

G 2 (x / £ 2 , s 2 ); 

Here, Gi(x / £1, si) is a known function which is chosen for defmiteness as a GPD with some 
parameters £1, Si . We choose G 2 (x / ^ 2 , s 2 ) as a pure Pareto tail: 

G 2 (x/^ 2 ,s 2 ) = (s 2 /x) 1/ ^ 2 . 

The parameters ^1, Si are supposed to be known, whereas £2, s 2 are unknown. The parameter s 2 is 
chosen such that the function F(x) is continuous at x = A : 

s 2 = s 2 &) = Ax(F (A/^i, Si) )\ ■ 

Thus, ^ 2 remains as a unique free parameter. Assuming that Na is fixed (A = 5.26xl0 27 , i.e. m w =7.8, 
N A = 17), we obtain a standard problem for the estimation of ^ 2 with 17 observations exceeding the 



1 < x < A 



x > A . 



threshold 5.26x10 . We fitted the Pareto distribution to these 17 observations and found the 
ML-estimate of t,2 and its variance: 

(8) ^ 2 = (l/N A )x]T In (x/ A) = .642; (a 2 ) 2 = fe) 2 / N A = (0.156) 2 . 

1 

The 90%-confidence interval for ^ 2 is thus [0.384; 0.894]. Inserting the boundary values 0.384; 

0.894 into the tail function (7), we obtain the following extreme values of the estimated tail of the 
distribution function at x = X max = 3.1xl0 28 : 

G 2 (X max / I , s 2 (|")) = 5.92X10" 4 ; G 2 (X max / | , s 2 (|)) = 4.26xl0" 5 . 

We see that, even when having assumed a known parametric family, we obtained a very uncertain 
result in the estimation of the tail probabilities: the ratio of the values of the tail distributions 
corresponding to the extreme values at the 90%-confidence level for the parameter £ is 14. The 

standard deviation 0.156 of £2 should be compared with the standard deviation 0.036 of the 
parameter £ estimated over the whole range M > 10 24 (N = 4985). This fourfold difference does not 
mean that we should believe more the estimation of the large data set over the whole range M > 10 24 
(N = 4985), because the corresponding estimation of the parameters of the GPD is strongly weighted 
by the large number of relatively small and intermediate events. This does not tell us what are the 
relevant parameters for the extreme values that determine the end tail characteristics. This example 
thus shows that the increase of the sample size obtained by incorporating small events can result in a 
strong underestimation of the real accuracy of the determination of the shape parameter of the end 
tail. Similar uncertain estimations should occur for the Gamma and Weibull distributions. For the 
Gamma distribution, the uncertainty is on the cross-over value, which is also completely controlled 
by the few observed extreme values. There is no issue to this problem: increasing the statistics gives 
more weight to smaller events which thus bias the determination of the exponent which cannot be 
extrapolated in the very far tail; decreasing the statistics to put full emphasis on the extreme tail gives 
very little data and thus a very large statistical uncertainty. This conclusion is not new but our 
analysis may have made it hopefully clearer and more quantitative. 

Fig. 5 compares the fit by three different parametric families which all look quite satisfactory. 
These three families are the GPD with the crossover point at x = 5.26x1 27 , the Weibull, and the 
Gamma DF. The GPD and Gamma approximations are practically indistinguishable in the range 
shown here and are very close to the histogram, whereas the Weibull DF is slightly worse. These 
three DF approximate the SZ-catalog tail with almost equal efficiency and none can be preferred. 

DISCUSSION AND CONCLUSIONS 

We have proposed to represent the extreme values of a given data set by the Generalized Pareto 
Distribution (GPD), which is a universal description of the tail of distributions of Peaks-Over- 
Thresholds. In many applications, this approach is justified since extreme observations are often of 
primary importance, e.g. in hydrology, seismic hazard and seismic risk assessments, insurance, etc. 
In the seismological application discussed here, this approach may appear somewhat restrictive 
because it covers only a part of the seismic moment range. Nevertheless, it has two advantages. First, 
it is based on the fundamental Gnedenko-Pickands-Balkema-de Haan (G-P-B-H) limit theorem of 
probability theory. Second, its parametric form is more natural and flexible when compared to other 
parametric families. The confidence interval for the shape parameter \ allows us to judge at once the 
uncertainty of the estimation. If the confidence interval of a high level includes both positive and 
negative values, then all three types of tails are possible: power-like, exponential and truncated tail. 
On the contrary, if the confidence interval contains only positive (only negative) values then a power- 
like tail (respectively, a truncated tail) is the most probable. The use of the G-P-B-H theorem that we 
have introduced in the present paper thus allows for a continuous comparison of distribution tails 
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ranging from a wide range of possibilities. In particular, a value of £, close to zero would signal 
an exponential distribution (see below). 

In choosing the lower threshold u, two contradictory requirements should be reconciled: 1) u has 
to be large enough to ensure applicability of the G-P-B-H theorem, and 2) a sufficiently big number 
of Peak-over-Threshold values should be kept in order to provide a reliable statistical estimation of 
the parameters. Of course, this trade-off is not always possible. 

What inference about the parameters of the seismological problem results from our GPD 
approach? Our conclusions refer only to the range of large events M > 10 24 dyne-cm with depth h < 
70 km. On the whole, the GPD provides quite satisfactory approximations of the tails of the 
distributions of the seismic energy released by earthquakes. Regional differences in the parameters 
between SZ are not so strong, at least for large shallow events. The GPD can thus be used in many 
problems of seismic risk assessment on a regional scale. 

The already documented observation that the slope parameter b of the Gutenberg-Richter law for 
shallow events is significantly smaller for subduction zones compared to mid-ocean ridges [Okal and 
Romanovicz, 1994; Kagan, 1997, 1999] is confirmed with respect to the tail behavior. Neither a 
statistical scatter nor a lower seismic flux of MORZ can mask this difference. An accepted 
geophysical explanation to this effect is still to be found. We offer the following conjecture. First, we 
note that the exponent l/£, is approximately 1 for MORZ (b-value = 1.5) compared to 2/3 for 
subduction zones (b-value = 1). Such a value l/£, = 1 of the exponent is found in many contexts in the 
distribution of energies associated with acoustic emission records of heterogeneous materials brought 
to rupture (Pollock, 1989; Omeltchenko et al., 1997; Fineberg and Marder, 1999; Lei et al., 2000) as 
well as in a model of self-organizing fault networks with faults interacting through long range 
elasticity, which heal slowly or not at all after an earthquake (Sornette and Vanneste, 1996). The 
value 1/^=1 thus seems characteristic of events controlled by faults remaining weak or "open", for a 
long time compared to the recurrence time between large events. In contrast, when faults are modeled 
as "dislocations", with increasing slip but healing stresses, the exponent is found smaller (Sornette et 
al., 1994.). It is thus tempting to associate the large value l/£, = 1 of MORZ earthquakes (at least for 
the transform earthquakes constituting the most numerous and powerful fraction of all oceanic 
events) with the largely extensional stress configuration and the presence of abundant water, which 
are both favorable for faults to remain weak and open. In contrast, the smaller value Xft, = 2/3 found 
for subduction zones could be interpreted as the signature of fast healing faults with a larger 
compressional component of stress. 

Even if the GPD works well in the intermediate range of data, there is always the possibility 
that, at the extreme end of the range of sizes, some deviation from the GPD may occur. Since the use 
of the GPD is warranted asymptotically according to the G-P-B-H theorem, such a deviation would 
signal a change of physics and the existence of new mechanisms. In the case of earthquake catalogs, 
such a deviation is usually related to the finite thickness of the seismogenic layers (although no direct 
evidence of this statement is demonstrated in the existing literature (Sornette et al., 1996; Main, 
2000)). Indeed, such deviations were observed in some regions, but the strongest deviation was found 
for the whole SZ catalog, probably because it contains much more events than regional catalogs. A 
special statistical test applied to the SZ catalog singled out 1 7 largest events deviating significantly 
from the GPD tail. The strong statistical significance of our tests justifies the quest for a parametric 
representation of these deviations (Kagan and Schoenberg, 2000). However, such a number of events 
is not sufficient to establish any functional form of this deviation. Several parametric families of 
distributions were fitted to the 1 7 largest events of the SZ catalog. The quality of the fits with these 
families was almost the same. An even more uncertain situation occurs for separate regional catalogs 
because of the smaller number of observations. Perhaps future additional observations will permit to 
answer the question of the functional form of the distribution of extremes more definitely. 

The uncertain situation at the end of the tails of distribution functions (DF) seems to be a general 
characteristic that applies beyond the evidence documented here for the seismic catalogs. Fig.6a 
shows the results of the fit with the GPD to a financial data set (see in [Laherrere and Sornette, 1998] 
for a description of the data set). The estimate of the form parameter of GPD with threshold u = .0001 
was the following: 

| = .0119±.0227(N=1977). 
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Fig.6b shows the results of the fit with the GPD to a data set characterizing the distribution of 
scientific citations set catalogued by the Institute for Scientific Information (see in [Laherrere and 
Sornette, 1998] for a description of the data set). In this case we got: 

| = .221 ± .036 (u = 2327, N=l 120). 

Note that for the financial data £, is not found to be statistically undistinguishable from 0, 
corresponding to an approximate exponential distribution whereas for citations the form parameter is 
positive with a high certainty, although the asymptotical power of the tail l/£, is rather high 

1/| =4.51 ±.74 

This exponent is higher that the exponent approximately equal to 3 (Redner, 1998) of the distribution 
of citations per paper (in contrast with the citations per author analyzed in figure 6b). In contrast, both 
the financial data set shown in figure 6a and the citation data set shown in figure 6b were fitted with a 
Weibull (or stretched exponential) distribution in [Laherrere and Sornette, 1998]. Again, we see 
quite a good approximation in the intermediate range, which stresses the difficulty in distinguishing 
between different fat-tail distributions. However, a few significant deviations can be observed at the 
very end of the range. Thus, the detection of extreme deviations from the GPD might have a general 
meaning. In the case of financial returns, this was recently documented with a different techniques 
using "draw downs" or "runs" that allow for a flexible time scale to adapt to the development of local 
correlations [Johansen and Sornette, 1998; 2000]. 

What statistical recommendations can be suggested concerning the seismic hazard (seismic risk) 
assessment and on related problems? Of course, when the sample size is small, no statistical method 
can help in a definitive way, but still some cautionary measures can be recommended. First of all, it is 
desirable to use several competing models in the tail estimation and to compare them. Their 
application can help judging the uncertainty of the statistical inference. By inserting the extreme 
parameter values of the confidence domain into a fitted tail function, one can compare the resulting 
difference of probabilities. The bootstrap approach can be very useful in this situation. Statistical 
estimation or hypothesis testing are easily modeled by the bootstrap method even for small samples. 
Sometimes, generating artificial samples and simple visual inspection can help in drawing 
conclusions. On the whole, the application of the GPD to the problem of seismic energy distribution 
and to other problems seems quite promising and deserves further development. 
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APPENDIX 



Testing deviations from GPD at the end of the distribution tail and estimation of the 
"change point". 

Our null hypothesis H is that the Generalized Pareto Distribution (GPD) adequately models the 
full range of the data. The statistical testing of the hypothesis H is done as follows. Suppose a ranked 
sample 

yj> ... >y N 

is drawn from a GPD-population with some parameters s . The values 

h = F (yi I s) , t N = F (y N /^s) 

are then distributed as N ranked random values taken from a uniform distribution on the interval 
[0,1]. Recall that F (x) is the tail distribution also known as the complementary cumulative 
distribution or survivor function: it gives the probability to exceed x. 

The mean value and the variance of tj are well known [Hajek and Sidak, 1967]: 

(1A) E tj = j / (N+l) ; Var tj = j (N - j +1) / (N +l) 2 (N+2). 

In order to construct a statistical test, we normalize the deviations tj : 

(2A) Pj = (tj - E^)/(Var tj ) 1/2 . 

The 100 largest values Pj for SZ and for several zones are shown in Fig. 5. If H is false, then some 
of the values pj should strongly deviate from zero. As a measure of this deviation we take the 
cumulative sums S r : 

(3A) S r = ( Pl ) 2 + ... + (p r ) 2 ,r=l,... 

Now we transform S r into the following statistics e r : 
(4A) e r = r(r/2,S r ),r=l,... 

where T( a, x) is the incomplete Gamma function. The statistics e r allows us to compare the 
significance of the deviation from the GPD for different values of r (if pj were standard Gaussian 
independent random values, then e r would equal exactly to the probabilities of excess for % 2 - 
distribution with r degrees of freedom). In fact, Pj are neither Gaussian nor independent, but it does 
not affect our further bootstrap estimation. Indeed, the transformations from the variables y to t and 
then to p ensure that the fluctuations which were initially of the power law type become "mild" with 
finite variance, thus ensuring a good statistical control. 

Finally, we calculate the decision statistic 8 m i n : 

(5 A) 5 min = min(£ r ) 

r 

The maximization is taken over 1 < r < n, where n is some a priori chosen number covering the 
interval of possible deviations from the GPD. In our applications, it was sufficient to take n = 20. The 
less 8 m i n is, the less probable is the hypothesis H . In practice, we do not know the exact values of the 



parameters and s. Therefore, we calculate the sample decision statistic <5 min through equations 
(1A)-(5A) using the sample values 

fj=F(V|,S), 

where | , s are ML-estimates of the parameters %, s. 

The distribution of the statistic <5 min is estimated as follows. ML-estimates |, 5 are fixed as true 

parameters. Then, L random samples of size N obeying the GPD with parameters J , 5 , are 
generated: 

x (1> = x/'> ...x N (1> 
x (L >=x 1 (L >...x N (L> 

For each sample x (l) , ML-estimates s (l) are determined. Then, the values pj (l) are obtained 
through the equations (1A-2A). Finally, we calculate 8 m j n (l) through equations (3A-5A). Taking L 
sufficiently large, one can estimate numerically with the desired accuracy the probability e = P 
{§min W < <5 min } characterizing the plausibility of the hypothesis H . If £ is small, then H is 
unlikely. 

Suppose the hypothesis H is rejected. The next step is then to estimate the "change point" in 
the ordered sample 

(6A) p <:...< p N , p = - E^)/(Var tj ) 1/2 , 

i.e. to find the index k such that for j < k the sample obeys the GPD and for k < j < N, it deviates 
from the GPD. This formulation is based on the digitization of the random variable i into k intervals. 
When there are no free parameters, the Chi-square has (k-1) degrees of freedom (one degree is lost 
from the condition of normalization of probabilities). When r parameters are estimated in the fitting 
procedure, the number of degrees of freedom is (k-r-1). In our case, we estimate 2 parameters (scale 
and form). So, we have (k-3) degrees of freedom. The number k of intervals has been chosen in such 
a way that they contain approximately the probability 1/k for the GPD with the corresponding 
parameters estimated by the Maximum Likelihood method. 

The solution to the problem of estimating the change point k is known for a slightly different 
situation when p b . . ., p k _i are identically independently distributed (iid) random values with 
probability density fi(x), and pk > • • • , Pn are iid random values with probability density f2(x). In this 
case, the maximum likelihood estimate k* provides the maximum of the likelihood L [Basseville and 
Benveniste,l986; Borovkov, 1997]: 

(7 A) L = fi(pi)... fi(p k -i)f2(pk)...f 2 (p N ). 

Maximization of eq.(7A) is equivalent to the maximization of the likelihood ratio L R : 

l r = [f 2 (p k y fi(Pk)]... [f 2 (p N y fi(p N )] . 

The densities fi(x), f 2 (x) are usually unknown and should be replaced by some estimators. We use the 
structure of this estimator for the estimation of the "change point" in the series of normalized 
deviations (6A) despite the fact that p. are neither identically distributed nor are they mutually 

independent. 



Several sample parametric estimators were tried for unknown densities fi(x), f2(x): 
exponential, Gaussian, Cauchy, but the best result on the artificial samples was obtained with a robust 
and very rough approximation to the likelihood ratio f2(x)/fi(x) : 

\ C |x| >h 

f 2 (x)/f 1 (x)= | 

Lc Ixl < h 

where C, c, h are some constants adjusted using the artificial samples. In our applications, we took C 
= e = 2.718, c = e" 3 s 0.05, h = 1. 

The statistical estimator k* of the "change point" is given by the value k which maximizes 
equation (7A). Its bias and standard deviation can be estimated by the bootstrap method similarly to 
the description above. 
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